Abstract: We suggest new recursive formulas to compute the 
exact value of the Kullback-Leibler distance (KLD) between two 
general Hidden Markov Trees (HMTs). For homogeneous HMTs 
with regular topology, such as homogeneous Hidden Markov 
Models (HMMs), we obtain a closed-form expression for the KLD 
when no evidence is given. We generalize our recursive formulas 
to the case of HMMs conditioned on the observable variables. 
Our proposed formulas are validated through several numerical 
examples in which we compare the exact KLD value with Monte 
Carlo estimations. 

Index Terms: Hidden Markov models, dependence tree models, 
information entropy, belief propagation, Monte Carlo methods 

I. Introduction 

Hidden Markov Models (HMMs) are a standard tool in 
many applications, including signal processing [1], [2], 
speech recognition [3], and biological sequence analysis 
[4]. Hidden Markov Trees (HMTs, also called "dependence 
tree models") generalize HMMs to tree topologies and are 
used in different contexts. In texture retrieval applications, they 
model the key features of the joint probability density of the 
wavelet coefficients of real-world data [6]. 

In estimation and classification contexts it is often necessary 
to compare different HMMs (or HMTs) through suitable 
distance measures. A standard (asymmetric) dissimilarity measure 
between two probability density functions p and q is the 
Kullback-Leibler distance, defined as [7]: 

$$D(p \| q) = \int p \log \frac{p}{q}.$$
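For discrete variables the integral above becomes a sum. The following Python sketch is only an illustration under that assumption; the function name `kld` and the convention of natural logarithms (nats) are ours, not taken from the paper.

```python
import math

def kld(p, q):
    """Discrete Kullback-Leibler distance D(p||q) in nats.

    p and q are sequences of probabilities over the same finite set.
    Terms with p[i] == 0 contribute 0; if q[i] == 0 while p[i] > 0
    the distance is infinite.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total

# The KLD is asymmetric: swapping the arguments changes the value.
print(kld([0.5, 0.5], [0.9, 0.1]))
print(kld([0.9, 0.1], [0.5, 0.5]))
```

The two printed values differ, which is why the text calls the KLD an asymmetric dissimilarity measure.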




An exact formula for the KLD between two Markov chains 
was introduced in [8]. Unfortunately, there is no such closed-form 
expression for HMTs and HMMs, as pointed out by 
several authors 0, 0, iflOl 

To overcome this issue, several alternative similarity measures 
were introduced for comparing HMMs. Recent examples 
of such measures are based on a probabilistic evaluation of the 
match between every pair of states [10], HMMs' stationary 
cumulative distribution [11], and transient behavior [12]. Other 
approaches are discussed in [13], [14]. 

When it is mandatory to work with the actual KLD, there 
are only two possibilities: 1) Monte Carlo estimation; 2) 
various analytical approximations. The former approach is 
easy to implement but also slow and inefficient. With regard 
to the latter, Do [9] provided an upper bound for the KLD 
between two general HMTs. Do's algorithm is fast because 
its computational complexity does not depend on the size of 
the data. Silva and Narayan extended these results in the case 
of left-to-right transient continuous density HMMs IT5l . 0. 
Variants of Do's result were discussed to consider the emission 
distributions of asynchronous HMMs (in the context of speech 
recognition) [16] and marginal distributions [17]. 

In this paper, we provide recursive formulas to compute the 
exact KLD between two general HMTs with no evidence. In 
the case of homogeneous HMTs with regular topology, we 
derive a closed-form expression for the KLD. In the particular 
case of homogeneous HMMs, this formula is a straightforward 
generalization of the expression given for Markov chains in 
[8]. It turns out that the KLD expression we suggest is exactly 
the well-known bound introduced in [9]: as a consequence, the 
latter is not a bound but the actual value of the KLD. Finally, 
we generalize our recursive formulas to compute the KLD 
between two HMMs conditioned on the observable variables. 
We validated our models by comparing the exact value of 
the KLD with Monte Carlo estimations in the following 
cases: 1) HMTs with no evidence; 2) HMMs with arbitrarily 
given evidence; 3) HMMs with no evidence. For comparison 
purposes, we experimented with the same sets of parameters 
as in the examples of [9]. 

II. Hidden Markov Trees 

A. The model 

In an HMT, each node is either a hidden variable $S_u$ or an 
observable variable $X_u$. Only hidden variables have children. 
We denote by $S_\emptyset$ the root of the tree and by $S_u$ the parent of 
$X_u$; see Figure 1. The joint probability distribution over all 
the variables of the model factorizes as 

$$\mathbb{P}(X, S) = \mathbb{P}(S_\emptyset)\,\mathbb{P}(X_\emptyset \mid S_\emptyset) \prod_{u \neq \emptyset} \mathbb{P}(S_u \mid S_{\mathrm{parent}(u)})\,\mathbb{P}(X_u \mid S_u).$$

We denote each index $u$ by a (finite) concatenation of 
characters belonging to a given finite and nonempty ordered 
set $\mathcal{V} = \{0, 1, 2, 3, \ldots\}$. In particular, $u$ is a regular expression 
belonging to $\{\emptyset\} \cup \mathcal{V}_1 \cup \mathcal{V}_2 \cup \ldots \cup \mathcal{V}_{N-1}$, where $\emptyset$ is the null 
string, $\mathcal{V}_1 = \mathcal{V}$, $\mathcal{V}_{i+1} = \{ua \mid u \in \mathcal{V}_i, a \in \mathcal{V}\}$, and $N$ is the tree 
depth. Using this notation, $S_v$ is a child of $S_u$ if and only 
if there exists $a \in \mathcal{V}$ such that $v = ua$. In the binary example 
shown in Figure 1, $\mathcal{V} = \{0, 1\}$ and $N - 1 = 2$. 
Fig. 1. Formalism for HMTs, binary tree with N = 3. 



The parameters of the model are $\mathbb{P}(X_u = x \mid S_u = s) = e_u(s, x)$ 
(emissions), $\mathbb{P}(S_{ua} = s \mid S_u = r) = \pi_{ua}(r, s)$ with 
$a \in \mathcal{V}$ (transitions), and $\mathbb{P}(S_\emptyset = s) = \mu(s)$. We denote the 
set of parameters by $\theta$; $\mathbb{P}_\theta$ denotes a probability distribution 
under $\theta$. 

In applications, we are often interested in $\mathbb{P}(S \mid X = x) = 
\mathbb{P}(X, S \mid \mathcal{E})$, where $\mathcal{E} = \{X = x\}$ is the evidence. Note 
that the notion of evidence can be generalized so as to consider 
any subsets $\mathcal{X}$, $\mathcal{S}$ of the sets of all possible outcomes of $X$ 
and $S$: $\mathcal{E} = \{X \in \mathcal{X}, S \in \mathcal{S}\}$. For ease of notation, we 
consider only the cases when either no evidence is given or 
the evidence is $\mathcal{E} = \{X = x\}$. We will explicitly develop the 
latter case only for HMMs (see Section III-B); however, it is 
easy to extend our results to the more general case of HMTs. 



B. Recursive formulas for exact KLD computation 

We derive recursive formulas for computing 
the exact Kullback-Leibler distance $D(\theta_1 \| \theta_0) = 
D(\mathbb{P}_{\theta_1}(X, S) \| \mathbb{P}_{\theta_0}(X, S))$ between two HMTs having 
the same underlying topology $\mathcal{T}$ and two distinct sets of 
parameters $\theta_1, \theta_0$. 

Definition 1: Given an index $u$ and $a \in \mathcal{V}$, consider the 
variables $\{X_{ua-}, S_{ua-}\}$ in the subtree $\mathcal{T}_{ua}$ of $\mathcal{T}$ rooted at 
$S_{ua}$ (e.g. $X_{01101}$ is in the subtree $\mathcal{T}_{011}$ rooted at $S_{011}$; here 
$u = 01$, $a = 1$ and $- = 01$). We define the inward quantity 
$K_{ua \to u}(S_u)$ as the KLD between the conditional probability 
distributions of $\{X_{ua-}, S_{ua-}\}$ given $S_u$, with parameters $\theta_1$ 
and $\theta_0$ respectively: 



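$$K_{ua \to u}(S_u) = D\left[\mathbb{P}_{\theta_1}(X_{ua-}, S_{ua-} \mid S_u) \,\|\, \mathbb{P}_{\theta_0}(X_{ua-}, S_{ua-} \mid S_u)\right], \tag{1}$$

where $ua-$ is reduced to $ua$ in the particular case when $X_{ua}$ 
is a leaf of the tree. 

Our first results are the following simple formulas that make 
it possible to compute the inward quantities and the (exact) 
KLD recursively (proofs in the Supplementary Material): 

Theorem 1: 

$$K_{ua \to u}(S_u) = \sum_{X_{ua}, S_{ua}} \mathbb{P}_{\theta_1}(X_{ua}, S_{ua} \mid S_u) \left( \log \frac{\mathbb{P}_{\theta_1}(X_{ua}, S_{ua} \mid S_u)}{\mathbb{P}_{\theta_0}(X_{ua}, S_{ua} \mid S_u)} + \sum_{b \in \mathcal{V}} K_{uab \to ua}(S_{ua}) \right) \tag{2}$$

with the convention that when $X_{ua}$ is a leaf, with $a \in \mathcal{V}$, we 
have $K_{uab \to ua}(S_{ua}) = 0$ for each $b \in \mathcal{V}$. Moreover, 

$$D(\theta_1 \| \theta_0) = \sum_{X_\emptyset, S_\emptyset} \mathbb{P}_{\theta_1}(X_\emptyset, S_\emptyset) \left( \log \frac{\mathbb{P}_{\theta_1}(X_\emptyset, S_\emptyset)}{\mathbb{P}_{\theta_0}(X_\emptyset, S_\emptyset)} + \sum_{a \in \mathcal{V}} K_{a \to \emptyset}(S_\emptyset) \right). \tag{3}$$

C. Homogeneous trees with constant number of children 

When the tree is homogeneous and the nodes $S$ have the 
same number of children (e.g. when $\mathcal{T}$ is binary as in Figure 
1), Eqs. (2) and (3) can be further simplified: 

Corollary 2: Suppose that the transition and emission probabilities 
are the same across the whole tree and each variable 
of type $S$ has exactly $C$ children of type $S$; then for each 
$a, a' \in \mathcal{V}$: $K_{ua \to u}(S_u) = K_{ua' \to u}(S_u)$. In particular, if $X_{ua}$ 
is not a leaf, then for each $a \in \mathcal{V}$: 

$$K_{ua \to u}(S_u) = k(S_u) + C \sum_{S_{u0}} \mathbb{P}_{\theta_1}(S_{u0} \mid S_u)\, K_{u00 \to u0}(S_{u0}), \tag{4}$$

where $k(S_u = r) = D[\mathbb{P}_{\theta_1}(X_0, S_0 \mid S_\emptyset = r) \| \mathbb{P}_{\theta_0}(X_0, S_0 \mid S_\emptyset = r)] = k(r)$. Moreover 

$$D(\theta_1 \| \theta_0) = k_\emptyset + C \sum_{S_\emptyset} \mu_{\theta_1}(S_\emptyset)\, K_{0 \to \emptyset}(S_\emptyset), \tag{5}$$

where $k_\emptyset = D[\mathbb{P}_{\theta_1}(X_\emptyset, S_\emptyset) \| \mathbb{P}_{\theta_0}(X_\emptyset, S_\emptyset)]$. 

By writing $\mu_{\theta_1}$, $k$, $\pi_{\theta_1}$ as a row, a column and a square 
matrix respectively, we obtain the following closed formula: 

$$D(\theta_1 \| \theta_0) = k_\emptyset + \mu_{\theta_1} \left( C I + C^2 \pi_{\theta_1} + C^3 \pi_{\theta_1}^2 + \ldots + C^{N-1} \pi_{\theta_1}^{N-2} \right) k, \tag{6}$$

where $N$ is the depth of the tree, $I$ the identity matrix and 
each node of type $S$ has exactly $C$ children of type $S$. 

1 For the sake of simplicity, we consider discrete variables; however, it is 
straightforward to extend our results to the case of continuous variables. An 
example is in the Supplementary Material. 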
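The backward recursion for homogeneous trees and the corresponding closed formula can be compared numerically. The Python sketch below is illustrative only: the function names are ours, the variables are assumed discrete with strictly positive parameters, and the local terms are computed through the chain rule as $k = D(\pi) + \pi_{\theta_1} D(e)$ and $k_\emptyset = D(\mu) + \mu_{\theta_1} D(e)$, the decomposition used in the Supplementary Material. The parameter values are those listed there for the HMM example, reused here on a binary tree purely as test data.

```python
import math

def kld(p, q):
    """Discrete KLD in nats, assuming q > 0 wherever p > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def local_terms(mu1, pi1, e1, mu0, pi0, e0):
    """Return k_root (scalar) and k (list indexed by the parent state r)."""
    d = len(mu1)
    De = [kld(e1[s], e0[s]) for s in range(d)]   # emission KLD per state
    k_root = kld(mu1, mu0) + sum(mu1[s] * De[s] for s in range(d))
    k = [kld(pi1[r], pi0[r]) + sum(pi1[r][s] * De[s] for s in range(d))
         for r in range(d)]
    return k_root, k

def hmt_kld_recursive(mu1, pi1, k_root, k, C, N):
    """Backward recursion over the N-1 non-root levels (N >= 2)."""
    d = len(k)
    K = list(k)                      # deepest level: K = k
    for _ in range(N - 2):           # climb up to the children of the root
        K = [k[r] + C * sum(pi1[r][s] * K[s] for s in range(d))
             for r in range(d)]
    return k_root + C * sum(mu1[s] * K[s] for s in range(d))

def hmt_kld_closed(mu1, pi1, k_root, k, C, N):
    """k_root + mu (C I + C^2 pi + ... + C^(N-1) pi^(N-2)) k."""
    d = len(k)
    v = list(k)                      # v = pi^j k, starting with j = 0
    total = 0.0
    for j in range(N - 1):
        total += C ** (j + 1) * sum(mu1[s] * v[s] for s in range(d))
        v = [sum(pi1[r][s] * v[s] for s in range(d)) for r in range(d)]
    return k_root + total

mu1, mu0 = [0.5, 0.5], [0.5, 0.5]
pi1, pi0 = [[0.9, 0.1], [0.2, 0.8]], [[0.7, 0.3], [0.4, 0.6]]
e1 = [[0.1, 0.3, 0.6], [0.2, 0.1, 0.7]]
e0 = [[0.3, 0.5, 0.2], [0.6, 0.2, 0.2]]
k_root, k = local_terms(mu1, pi1, e1, mu0, pi0, e0)
# Binary tree (C = 2) of depth N = 3: both routes should agree.
print(hmt_kld_recursive(mu1, pi1, k_root, k, 2, 3))
print(hmt_kld_closed(mu1, pi1, k_root, k, 2, 3))
```

Both functions cost O(N) matrix-vector products; the recursion is the form that extends to heterogeneous trees, while the closed formula only applies when every level shares the same parameters.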



III. Hidden Markov Models 

With reference to the notation used in the previous section, 
an HMM is an HMT in which each variable of type $S$ has 
only one child of type $S$ (i.e. $C = 1$). In particular, we can 
rename the variables so that $S = S_{1:N}$ is the hidden (Markov) 
sequence and $X = X_{1:N}$ is the sequence of observable 
variables. In the homogeneous case, the parameters of the 
model are $\mu(s) = \mathbb{P}(S_1 = s)$, $\pi(r, s) = \mathbb{P}(S_i = s \mid S_{i-1} = r)$, 
$e(s, x) = \mathbb{P}(X_i = x \mid S_i = s)$. 
A. No Evidence 

When the variables $X_{1:N}$ are not actually observed (that is, 
there is no evidence in the model), all the formulas derived in 
the general case of trees continue to hold. Eq. (6) gives the 
KLD between two homogeneous HMMs when no evidence is 
given: 

$$D(\theta_1 \| \theta_0) = k_\emptyset + \mu_{\theta_1} \left( I + \pi_{\theta_1} + \ldots + \pi_{\theta_1}^{N-2} \right) k, \tag{7}$$

where $k_\emptyset$ and $k$ are defined exactly as in the previous section. 
Note that this formula is a straightforward extension of the 
results for Markov chains proved in Theorem 1 of [8]. Moreover, 
it can be proved that the closed-form expression in Eq. (7) is 
exactly the bound given in Eq. (19) of [9]; see the Supplementary 
Material for the details. 

Let $\nu$ be the stationary distribution of $\pi_{\theta_1}$; then $\mu_{\theta_1} \pi_{\theta_1}^i k$ 
converges towards $\nu k$ for large $i$. From Eq. (7), by simply 
computing a Cesàro mean limit, we obtain the KLD rate 

$$\overline{D}(\theta_1 \| \theta_0) := \lim_{N \to \infty} \frac{D(\theta_1 \| \theta_0)}{N} = \nu k. \tag{8}$$

As observed in [9], $\nu k$ can be computed in constant time with respect to 
$N$, whereas the exact closed formula of Eq. (7) is computable 
in $O(N)$ with a direct implementation, or in $O(\log_2(N))$ 
with a more sophisticated approach (see the Supplementary 
Material for the details). 
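As an illustration of the closed formula for HMMs and of the convergence of $D(\theta_1 \| \theta_0)/N$ to $\nu k$, the following Python sketch evaluates both with the direct $O(N)$ scheme. It is a sketch under assumptions: the function names are ours, the parameters are the discrete HMMs listed in the Supplementary Material, and $k_\emptyset$ and $k$ are computed through the chain-rule decomposition given there.

```python
import math

def kld(p, q):
    """Discrete KLD in nats, assuming q > 0 wherever p > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

# Parameters as listed in the Supplementary Material (two hidden states,
# three emission symbols).
mu1, mu0 = [0.5, 0.5], [0.5, 0.5]
pi1, pi0 = [[0.9, 0.1], [0.2, 0.8]], [[0.7, 0.3], [0.4, 0.6]]
e1 = [[0.1, 0.3, 0.6], [0.2, 0.1, 0.7]]
e0 = [[0.3, 0.5, 0.2], [0.6, 0.2, 0.2]]

De = [kld(e1[s], e0[s]) for s in range(2)]
k_root = kld(mu1, mu0) + sum(mu1[s] * De[s] for s in range(2))
k = [kld(pi1[r], pi0[r]) + sum(pi1[r][s] * De[s] for s in range(2))
     for r in range(2)]

def hmm_kld(N):
    """k_root + sum_{i=0}^{N-2} mu1 pi1^i k, evaluated term by term."""
    row = list(mu1)                  # row = mu1 pi1^i, starting with i = 0
    total = k_root
    for _ in range(N - 1):
        total += sum(row[s] * k[s] for s in range(2))
        row = [sum(row[r] * pi1[r][s] for r in range(2)) for s in range(2)]
    return total

nu = [2 / 3, 1 / 3]                  # stationary distribution of pi1
rate = sum(nu[s] * k[s] for s in range(2))
for N in (10, 100, 1000):
    print(N, hmm_kld(N) / N, rate)   # the ratio approaches the rate
```

Because the second eigenvalue of this transition matrix is 0.7, the row vector $\mu_{\theta_1} \pi_{\theta_1}^i$ approaches $\nu$ geometrically, so the gap between the ratio and the rate shrinks like $1/N$.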

B. Xs observed 

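Now we assume that the variables of type $X$ are actually 
observed, as is often the case in practice. In particular, 
we consider the evidence $\mathcal{E} = \{X_{1:N} = x_{1:N}\}$ 
and we want to compute $D(\mathbb{P}_{\theta_1}(X, S \mid \mathcal{E}) \| \mathbb{P}_{\theta_0}(X, S \mid \mathcal{E})) = 
D(\mathbb{P}_{\theta_1}(S \mid \mathcal{E}) \| \mathbb{P}_{\theta_0}(S \mid \mathcal{E}))$. 

For the sake of simplicity, we denote the inward quantity 
indexed by $i + 1 \to i$ simply as $K_i^{\mathcal{E}}(S_i)$. Eqs. (2) and (3) 
become: 

$$K_{i-1}^{\mathcal{E}}(S_{i-1}) = \sum_{S_i} \mathbb{P}_{\theta_1}(S_i \mid S_{i-1}, \mathcal{E}) \left( \log \frac{\mathbb{P}_{\theta_1}(S_i \mid S_{i-1}, \mathcal{E})}{\mathbb{P}_{\theta_0}(S_i \mid S_{i-1}, \mathcal{E})} + K_i^{\mathcal{E}}(S_i) \right)$$

for $i = N, \ldots, 2$, with $K_N^{\mathcal{E}} = 0$, and 

$$D(\mathbb{P}_{\theta_1}(S \mid \mathcal{E}) \| \mathbb{P}_{\theta_0}(S \mid \mathcal{E})) = \sum_{S_1} \mathbb{P}_{\theta_1}(S_1 \mid \mathcal{E}) \left( \log \frac{\mathbb{P}_{\theta_1}(S_1 \mid \mathcal{E})}{\mathbb{P}_{\theta_0}(S_1 \mid \mathcal{E})} + K_1^{\mathcal{E}}(S_1) \right).$$

The conditional probabilities $\mathbb{P}(S_i \mid S_{i-1}, \mathcal{E})$ are computed 
recursively: for instance, one can consider the backward 
quantities $B_i(s) = \mathbb{P}(X_{i+1:N} = x_{i+1:N} \mid S_i = s)$. In the 
homogeneous case these are computed recursively from 
$B_N(s) = 1$ with $B_{i-1}(r) = \sum_s \pi(r, s)\, e(s, x_i)\, B_i(s)$, for $i = 
N, \ldots, 2$. Then we obtain the following conditional probabilities: 
$\mathbb{P}(S_i = s \mid S_{i-1} = r, \mathcal{E}) = \pi(r, s)\, e(s, x_i)\, B_i(s) / B_{i-1}(r)$, 
and $\mathbb{P}(S_1 = s \mid \mathcal{E}) \propto \mu(s)\, e(s, x_1)\, B_1(s)$. 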
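The backward quantities and the recursion on the inward quantities translate into a few lines of code. The Python sketch below is a minimal illustration for homogeneous discrete HMMs; the function names and the 0-based encoding of states and symbols are ours, all parameters are assumed strictly positive, and no rescaling is applied, so the backward quantities may underflow on long sequences.

```python
import math

def posterior_chain(mu, pi, e, x):
    """Return P(S_1 | E) and the matrices P(S_i | S_{i-1}, E), i = 2..N."""
    n, d = len(x), len(mu)
    B = [[1.0] * d for _ in range(n)]            # B[j] stores B_{j+1}; last is 1
    for i in range(n - 1, 0, -1):
        B[i - 1] = [sum(pi[r][s] * e[s][x[i]] * B[i][s] for s in range(d))
                    for r in range(d)]
    w = [mu[s] * e[s][x[0]] * B[0][s] for s in range(d)]
    z = sum(w)
    first = [v / z for v in w]                   # normalised P(S_1 | E)
    trans = [[[pi[r][s] * e[s][x[i]] * B[i][s] / B[i - 1][r]
               for s in range(d)] for r in range(d)]
             for i in range(1, n)]
    return first, trans

def conditional_kld(theta1, theta0, x):
    """D(P_theta1(S|E) || P_theta0(S|E)) by backward recursion on K."""
    f1, t1 = posterior_chain(*theta1, x)
    f0, t0 = posterior_chain(*theta0, x)
    d = len(f1)
    K = [0.0] * d                                # inward quantity at the last position
    for m1, m0 in zip(reversed(t1), reversed(t0)):
        K = [sum(m1[r][s] * (math.log(m1[r][s] / m0[r][s]) + K[s])
                 for s in range(d) if m1[r][s] > 0)
             for r in range(d)]
    return sum(f1[s] * (math.log(f1[s] / f0[s]) + K[s])
               for s in range(d) if f1[s] > 0)

theta1 = ([0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]],
          [[0.1, 0.3, 0.6], [0.2, 0.1, 0.7]])
theta0 = ([0.5, 0.5], [[0.7, 0.3], [0.4, 0.6]],
          [[0.3, 0.5, 0.2], [0.6, 0.2, 0.2]])
x = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]               # symbols 1, 2, 3 encoded as 0, 1, 2
print(conditional_kld(theta1, theta0, x))
```

The cost is linear in the length of the evidence, whereas a direct sum over all hidden sequences grows exponentially with it.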

2 $\nu k$ is exactly the bound for the KLD rate given in [9]. 

3 In the heterogeneous case we can classically derive similar formulas. 



TABLE I 

HMTs WITH NO EVIDENCE, EXACT KLD = 0.690. 

Trials   MC      95% CI 
10^2     0.752   [0.580, 0.925] 
10^3     0.673   [0.616, 0.730] 
10^4     0.691   [0.673, 0.709] 
10^5     0.690   [0.684, 0.696] 
10^6     0.688   [0.687, 0.690] 



IV. Numerical Experiments 

We ran numerical experiments to compare our exact formu- 
las with Monte Carlo approximations. 

HMTs, no evidence. We compared the exact value and 
Monte Carlo estimations of the KLD for the pair of trees 
considered in [9]. In these trees, the variables of type $X$ are 
mixtures of two zero-mean Gaussians: we can easily adapt 
Eq. (2) to this case, as shown in the Supplementary Material. 
The exact value of the KLD is 0.690. The results in Table I 
show that a large number of simulations is necessary for 
the MC estimations to approximate the exact KLD value 
properly. We computed the bound suggested by Do in [9] and 
obtained a value which is different from the one shown in 
Figure 3 of [9]: in particular, the value of Do's bound turned 
out to be the same as the value of the exact KLD. This 
inconsistency is probably due to a minor numerical issue in 
[9] and can be safely ignored because Monte Carlo estimations 
clearly validate our computations. 

HMMs, no evidence. We experimented with the pair of 
discrete HMMs considered in [9]; the two sets of parameters 
can be found in the Supplementary Material. We implemented 
Eqs. (7), (8) for computing $D(\theta_1 \| \theta_0)/N$ and the KLD rate (KLDR). For 
Monte Carlo estimations, we ran $n = 1000$ independent trials 
for each value of $N$. The results are depicted in Figure 2 
and show that the proposed recursions for the computation 
of the exact KLD give results consistent with Monte Carlo 
approximations. Moreover, the ratio $D(\theta_1 \| \theta_0)/N$ converges 
very fast to the KLD rate. Note that these results differ from 
the ones in Figure 2 of [9], where Do's bound (i.e. the exact 
KLD rate) seems not to be attained for $N = 100$. Again, 
Monte Carlo estimations support our computations. 
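A Monte Carlo estimate of this kind can be sketched as follows. The paper does not detail its estimator, so this is an assumed, standard scheme: draw complete pairs $(S, X)$ under $\theta_1$, average the log-likelihood ratio of the joint distributions, and report a normal-approximation 95% half-width. The function names, the seed and the 0-based encoding are ours; all parameters are assumed strictly positive.

```python
import math
import random

def sample_hmm(mu, pi, e, N, rng):
    """Draw a hidden sequence s and an observed sequence x of length N."""
    d, m = len(mu), len(e[0])
    s = [rng.choices(range(d), weights=mu)[0]]
    for _ in range(N - 1):
        s.append(rng.choices(range(d), weights=pi[s[-1]])[0])
    x = [rng.choices(range(m), weights=e[si])[0] for si in s]
    return s, x

def log_joint(mu, pi, e, s, x):
    """log P(X = x, S = s) for a homogeneous HMM."""
    lp = math.log(mu[s[0]] * e[s[0]][x[0]])
    for i in range(1, len(s)):
        lp += math.log(pi[s[i - 1]][s[i]] * e[s[i]][x[i]])
    return lp

def mc_kld(theta1, theta0, N, trials, rng):
    """Return the MC estimate of the KLD and its 95% half-width."""
    vals = []
    for _ in range(trials):
        s, x = sample_hmm(*theta1, N, rng)
        vals.append(log_joint(*theta1, s, x) - log_joint(*theta0, s, x))
    mean = sum(vals) / trials
    var = sum((v - mean) ** 2 for v in vals) / (trials - 1)
    return mean, 1.96 * math.sqrt(var / trials)

theta1 = ([0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]],
          [[0.1, 0.3, 0.6], [0.2, 0.1, 0.7]])
theta0 = ([0.5, 0.5], [[0.7, 0.3], [0.4, 0.6]],
          [[0.3, 0.5, 0.2], [0.6, 0.2, 0.2]])
mean, half = mc_kld(theta1, theta0, 10, 1000, random.Random(0))
print(mean, "+/-", half)
```

The half-width shrinks only like the inverse square root of the number of trials, which is consistent with the slow convergence reported for the Monte Carlo estimates.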

HMMs with evidence. We considered the same HMMs as 
above with an arbitrarily given evidence $\mathcal{E} = \{X_{1:N} = x_{1:N}\}$ 
(see the Supplementary Material). Figure 3 shows that the 
exact values of $D(\mathbb{P}_{\theta_1}(S \mid \mathcal{E}) \| \mathbb{P}_{\theta_0}(S \mid \mathcal{E}))$ computed with our 
recursions are consistent with Monte Carlo approximations. 
In this case, there is no asymptotic behavior because of the 
irregularity of the evidence. 

V. Conclusion 

The most important contribution of this paper is a new theoretical 
framework for the exact computation of the Kullback-Leibler 
distance between two hidden Markov trees (or models) 
based on backward recursions. This approach makes it possible 
to obtain new recursive formulas for computing the exact 
distance between the conditional probabilities of two hidden 
Markov models when the observable variables are given as 
evidence. When no evidence is given, we derive a closed-form 
expression for the exact value of the KLD which generalizes 
previous results about Markov chains [8]. In the case of HMMs 
this generalization is not surprising, as the pairs of hidden 
and observable variables are the elements of a Markov chain. 
However, quite surprisingly, to the best of our knowledge these 
results have not been explicitly derived earlier. 

Fig. 2. HMMs with no evidence. 95% confidence intervals shown for MC 
estimations. Methods shown: exact recursions; Monte Carlo, 1000 replicates; 
exact KLDR = bound in [9]. 

Fig. 3. HMMs conditioned on the observable variables. 95% confidence 
intervals shown for MC estimations. Methods shown: exact recursions; 
Monte Carlo, 1000 replicates. 

It can be easily shown that our closed-form expression is 
exactly the bound suggested in [9]: the proof for HMMs is 
given in the Supplementary Material. In [9] a necessary and 
sufficient condition is given for the bound to be the exact value 
of the KLD. We argue that the suggested bound is the exact 
value even if this condition is not satisfied (a simple numerical 
counterexample is given in the Supplementary Material). The 
reason why the exact value of the KLD is considered as an 
upper bound in [9] seems to be an inappropriate use of the 
equality condition in Lemma 1 of [9]. Indeed, this condition is 
certainly sufficient but not necessary (because $\int f = \int g$ 
does not imply $f = g$). Finally, we observe that the main 
difference between our formalism and the one in [9] is that 
we suggest new recursions to compute the KLD, whereas in 
[9] the standard backward quantities for HMTs and HMMs 
are used. 
Appendix 

Supplementary Material with Technical Details 

Proofs of Results in Section II 

We give detailed proofs of some of the results from Section II 
in the main paper. 

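Proof of Theorem 1: Eq. (2) in the main paper is obtained 
from Eq. (1) by observing that 

$$\mathbb{P}(X_{ua-}, S_{ua-} \mid S_u) = \mathbb{P}(X_{ua}, S_{ua} \mid S_u)\, \mathbb{P}(X_{uab-}, S_{uab-} \text{ for all } b \in \mathcal{V} \mid S_{ua}),$$

and that $\bigcup_{b \in \mathcal{V}} \{X_{uab-}, S_{uab-}\}$ is a partition of 
$\{X_{ua-}, S_{ua-}\} \setminus \{X_{ua}, S_{ua}\}$. Since the subtrees rooted at the 
children $S_{uab}$ are conditionally independent given $S_{ua}$, the KLD 
of the second factor splits into the sum over $b$ of the inward 
quantities $K_{uab \to ua}(S_{ua})$. ∎ 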

In order to prove Corollary 2, first we prove the following 
lemma: 

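Lemma 3: If the transition and emission probabilities are 
the same across the whole tree, then: 

$$K_{ua \to u}(S_u) = k_{ua}(S_u) + \sum_{S_{ua}} \mathbb{P}_{\theta_1}(S_{ua} \mid S_u) \sum_{b \in \mathcal{V}} K_{uab \to ua}(S_{ua}),$$

where $k_{ua}(S_u)$ does not depend on $u$ and $a$: 

$$k_{ua}(S_u = r) = \sum_{s, x} \pi_{\theta_1}(r, s)\, e_{\theta_1}(s, x) \log \frac{\pi_{\theta_1}(r, s)\, e_{\theta_1}(s, x)}{\pi_{\theta_0}(r, s)\, e_{\theta_0}(s, x)} = D[\mathbb{P}_{\theta_1}(X_0, S_0 \mid S_\emptyset = r) \| \mathbb{P}_{\theta_0}(X_0, S_0 \mid S_\emptyset = r)] = k(r).$$

Moreover 

$$D(\theta_1 \| \theta_0) = k_\emptyset + \sum_{S_\emptyset} \mu_{\theta_1}(S_\emptyset) \sum_{a \in \mathcal{V}} K_{a \to \emptyset}(S_\emptyset),$$

where 

$$k_\emptyset = \sum_{s, x} \mu_{\theta_1}(s)\, e_{\theta_1}(s, x) \log \frac{\mu_{\theta_1}(s)\, e_{\theta_1}(s, x)}{\mu_{\theta_0}(s)\, e_{\theta_0}(s, x)} = D[\mathbb{P}_{\theta_1}(X_\emptyset, S_\emptyset) \| \mathbb{P}_{\theta_0}(X_\emptyset, S_\emptyset)].$$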


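Proof of Lemma 3: We only prove the first equation. 
Because of Eq. (2): 

$$K_{ua \to u}(S_u) = \sum_{X_{ua}, S_{ua}} \mathbb{P}_{\theta_1}(X_{ua}, S_{ua} \mid S_u) \log \frac{\mathbb{P}_{\theta_1}(X_{ua}, S_{ua} \mid S_u)}{\mathbb{P}_{\theta_0}(X_{ua}, S_{ua} \mid S_u)} + \sum_{X_{ua}, S_{ua}} \mathbb{P}_{\theta_1}(X_{ua}, S_{ua} \mid S_u) \sum_{b \in \mathcal{V}} K_{uab \to ua}(S_{ua}).$$

The first term in this sum does not depend on $u$ and $a$ since 
the transition and emission probabilities are constant; it is 
straightforward to obtain its expression $k(\cdot)$. The second term 
is equal to 

$$\sum_{S_{ua}} \mathbb{P}_{\theta_1}(S_{ua} \mid S_u) \sum_{X_{ua}} \mathbb{P}_{\theta_1}(X_{ua} \mid S_{ua}) \sum_{b \in \mathcal{V}} K_{uab \to ua}(S_{ua}) = \sum_{S_{ua}} \mathbb{P}_{\theta_1}(S_{ua} \mid S_u) \sum_{b \in \mathcal{V}} K_{uab \to ua}(S_{ua}).$$

∎ 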



Proof of Corollary 2: The key point here is to prove 
that for each $a, a' \in \mathcal{V}$: $K_{ua \to u}(S_u) = K_{ua' \to u}(S_u)$; we will 
do it by induction on the levels of the tree. By definition of 
inward quantity and by the lemma above, if $X_{ua}$ is a leaf, 
with $a \in \mathcal{V}$, then $K_{ua \to u}(S_u) = k(S_u)$ for each $a$. 

Now suppose that $K_{uab \to ua}(S_{ua}) = K_{uab' \to ua}(S_{ua})$ 
$\forall a, b, b' \in \mathcal{V}$ and $\forall u$ of a given length $m$ (inductive step). 
In particular, for each $a, b \in \mathcal{V}$ and $u$ of length $m$, we have 
$K_{uab \to ua}(S_{ua}) = K_{u00 \to u0}(S_{u0})$. It is now easy to see that 
$K_{ua \to u}(S_u) = K_{ua' \to u}(S_u)$ for each $a, a' \in \mathcal{V}$ and $u$ of 
length $m$: by the lemma above 

$$\begin{aligned}
K_{ua \to u}(S_u) &= k(S_u) + \sum_{S_{ua}} \mathbb{P}_{\theta_1}(S_{ua} \mid S_u) \sum_{b \in \mathcal{V}} K_{uab \to ua}(S_{ua}) \\
&= k(S_u) + \sum_{S_{u0}} \mathbb{P}_{\theta_1}(S_{u0} \mid S_u) \sum_{b \in \mathcal{V}} K_{u00 \to u0}(S_{u0}) \\
&= k(S_u) + C \sum_{S_{u0}} \mathbb{P}_{\theta_1}(S_{u0} \mid S_u)\, K_{u00 \to u0}(S_{u0}).
\end{aligned}$$

∎ 

Comparison with [9] 

We show that the bound suggested by Do in [9] is the actual 
value of the KLD. For the sake of simplicity we will only 
consider HMMs; however, it is straightforward to generalize 
the following to more general HMTs. 

The closed-form expression for the exact value of the KLD 
between HMMs (no evidence) is 

$$D(\theta_1 \| \theta_0) = k_\emptyset + \mu_{\theta_1} \left( I + \pi_{\theta_1} + \ldots + \pi_{\theta_1}^{N-2} \right) k.$$

For comparison purposes, we rewrite $k_\emptyset$ and $k$ as 

$$k_\emptyset = D(\mu_{\theta_1} \| \mu_{\theta_0}) + \mu_{\theta_1} D(e_{\theta_1} \| e_{\theta_0}) = D(\mu) + \mu_{\theta_1} D(e),$$

$$k = D(\pi_{\theta_1} \| \pi_{\theta_0}) + \pi_{\theta_1} D(e_{\theta_1} \| e_{\theta_0}) = D(\pi) + \pi_{\theta_1} D(e),$$

where the $j$th component of the vector $D(e) := D(e_{\theta_1} \| e_{\theta_0})$ 
is $D(e_{\theta_1}(j, \cdot) \| e_{\theta_0}(j, \cdot))$, and similarly the $j$th component of 
$D(\pi) := D(\pi_{\theta_1} \| \pi_{\theta_0})$ is $D(\pi_{\theta_1}(j, \cdot) \| \pi_{\theta_0}(j, \cdot))$. The reader 
should not confuse Do's symbol $e$, which is $D(e)$ in our 
notation, with our emission matrix $e$. Moreover, Do's vector 
$d$ becomes $D(\pi) + D(e)$ in our notation. 

Using this notation, Do's upper bound in the case of 
HMMs, Eq. (19) in [9], is 

$$U = D(\mu) + \mu_{\theta_1} \left( \sum_{i=0}^{N-2} \pi_{\theta_1}^i \left[ D(\pi) + D(e) \right] + \pi_{\theta_1}^{N-1} D(e) \right).$$

Proposition 4: $D(\theta_1 \| \theta_0) = U$. 
Proof: 

$$\begin{aligned}
D(\theta_1 \| \theta_0) &= D(\mu) + \mu_{\theta_1} D(e) + \mu_{\theta_1} \left( I + \pi_{\theta_1} + \ldots + \pi_{\theta_1}^{N-2} \right) \left( D(\pi) + \pi_{\theta_1} D(e) \right) \\
&= D(\mu) + \mu_{\theta_1} \left( \sum_{i=0}^{N-2} \pi_{\theta_1}^i \left[ D(\pi) + D(e) \right] + \pi_{\theta_1}^{N-1} D(e) \right) = U.
\end{aligned}$$

∎ 

In [9] it is explained that $D(\theta_1 \| \theta_0) = U$ if and only if 

$$\mathbb{P}_{\theta_1}(S = s \mid X = x) = \mathbb{P}_{\theta_0}(S = s \mid X = x), \quad \text{for all } s, x.$$

We observe that this condition is not fulfilled in general, 
whereas $D(\theta_1 \| \theta_0) = U$ is always true as shown above. For 
instance, consider the HMMs of length 10 with the same 
parameters as in Eq. (22) of [9]. For the arbitrarily fixed 

s = (1,1,1,1,1,2,2,2,2,2) 
x = (1,1,1,2,2,2,3,3,3,3) 

we have $\mathbb{P}_{\theta_1}(S = s \mid X = x) = 0.91$, $\mathbb{P}_{\theta_0}(S = s \mid X = x) = 0.10$ 
and $D(\theta_1 \| \theta_0) = U = 0.071$. 

Computation of $\sum_{i=0}^{N-2} \pi^i k$ 

When considering HMMs with no evidence, the exact KLD 
expression involves a term of the form $\sum_{i=0}^{N-2} \pi^i k$, where $\pi$ 
is a stochastic matrix (of order $d$, where $d$ is the number of 
hidden states), and $k$ is a column vector. Note that, because $\pi$ 
is stochastic, $I - \pi$ is not invertible. Is it possible to compute 
this sum with a complexity smaller than $O(d^2 N)$? The answer 
to this question is indeed "yes", but a little bit of linear algebra 
is required. 

Let us assume that there exists $P = (v_1, \ldots, v_d)$, a basis 
of (column) eigenvectors of $\pi$ such that $\pi = P D P^{-1}$, 
where $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_d)$ is the diagonal matrix of the 
corresponding eigenvalues. For the sake of simplicity, we 
assume that $\lambda_1 = 1.0$ and that $|\lambda_j| < 1$ if $j \neq 1$ (for example, 
this is true if $\pi$ is primitive, which means that $\exists i$ such that 
$\pi^i > 0$). Nevertheless, the following method can be easily 
extended to the case when the eigenvalue 1.0 has a multiplicity 
greater than 1. 

By defining the matrix $\tilde{\pi} = 
P\, \mathrm{diag}(0, \lambda_2, \ldots, \lambda_d)\, P^{-1}$, for which $I - \tilde{\pi}$ is invertible, 
and decomposing $k$ with respect to 
the eigenvector basis as $k = k_1 v_1 + \tilde{k}$, we obtain 

$$\pi k = k_1 v_1 + \tilde{\pi} \tilde{k}.$$

It follows that 

$$\sum_{i=0}^{N-2} \pi^i k = (N - 1) k_1 v_1 + \sum_{i=0}^{N-2} \tilde{\pi}^i \tilde{k} = (N - 1) k_1 v_1 + (I - \tilde{\pi}^{N-1})(I - \tilde{\pi})^{-1} \tilde{k},$$

which can be computed in $O(d^3 \log_2 N)$ by obtaining $\tilde{\pi}^{N-1}$ 
through a binary decomposition of $N - 1$. 
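The same logarithmic cost can also be reached without an eigendecomposition. The Python sketch below is an alternative we swap in for illustration, not the method described above: it builds the partial sum $S_n = I + \pi + \ldots + \pi^{n-1}$ by repeated doubling, using $S_{2m} = S_m + \pi^m S_m$ and $S_{m+1} = S_m + \pi^m$. The function names are ours; the sum needed in the text corresponds to $n = N - 1$.

```python
def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def matadd(A, B):
    """Entrywise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def power_and_sum(pi, n):
    """Return (pi^n, I + pi + ... + pi^(n-1)) with O(log n) products."""
    d = len(pi)
    if n == 0:
        identity = [[float(i == j) for j in range(d)] for i in range(d)]
        zero = [[0.0] * d for _ in range(d)]
        return identity, zero
    if n % 2 == 0:
        P, S = power_and_sum(pi, n // 2)
        return matmul(P, P), matadd(S, matmul(P, S))   # S_2m = S_m + pi^m S_m
    P, S = power_and_sum(pi, n - 1)
    return matmul(P, pi), matadd(S, P)                 # S_(m+1) = S_m + pi^m

def geometric_sum_times(pi, k, n):
    """Return (sum_{i=0}^{n-1} pi^i) k as a list."""
    _, S = power_and_sum(pi, n)
    d = len(k)
    return [sum(S[r][s] * k[s] for s in range(d)) for r in range(d)]

pi = [[0.9, 0.1], [0.2, 0.8]]
k = [0.5, 0.7]          # arbitrary column vector, for illustration only
N = 100
print(geometric_sum_times(pi, k, N - 1))
```

This variant needs no assumption on the eigenvalues of $\pi$, at the price of working with full $d \times d$ products at every step.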

Numerical Experiments 
HMMs with no evidence 

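We considered the same set of parameters as in Eq. (22) of [9]. 
In our notation: 

$$\mu_{\theta_1} = (0.5 \;\; 0.5), \quad \pi_{\theta_1} = \begin{pmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{pmatrix}, \quad e_{\theta_1} = \begin{pmatrix} 0.1 & 0.3 & 0.6 \\ 0.2 & 0.1 & 0.7 \end{pmatrix},$$

$$\mu_{\theta_0} = (0.5 \;\; 0.5), \quad \pi_{\theta_0} = \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix}, \quad e_{\theta_0} = \begin{pmatrix} 0.3 & 0.5 & 0.2 \\ 0.6 & 0.2 & 0.2 \end{pmatrix}.$$

The stationary distribution of $\pi_{\theta_1}$ is $\nu = (2/3 \;\; 1/3)$: 
$\nu \pi_{\theta_1} = \nu$ and $\mu_{\theta_1} \pi_{\theta_1}^i \to \nu$ for large $i$. 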



HMMs with evidence 

For $N = 100$ we took as evidence the vector 
$x_{1:100}$ where: 1) all the components with positions 
[1, 10], [31, 40], [61, 70], [91, 100] are equal to 1; 2) the components 
with positions [11, 20], [41, 50], [71, 80] are equal to 
2; 3) the components with positions [21, 30], [51, 60], [81, 90] 
are equal to 3. For $5 < N < 95$, the components of $x_{1:N}$ are 
the first $N$ values of $x_{1:100}$. 

HMTs, no evidence 

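We considered the same HMTs as in Eq. (23) of [9]. All the 
$S$ nodes belonging to the same level have the same set of 
parameters. In our notation: 

$$\mu_{\theta_1} = (0.69 \;\; 0.31), \quad \pi^{0}_{\theta_1} = \begin{pmatrix} 0.99 & 0.01 \\ 0.22 & 0.78 \end{pmatrix}, \quad \pi^{00}_{\theta_1} = \begin{pmatrix} 0.99 & 0.01 \\ 0.32 & 0.68 \end{pmatrix},$$

$$\mu_{\theta_0} = (0.63 \;\; 0.37), \quad \pi^{0}_{\theta_0} = \begin{pmatrix} 0.98 & 0.02 \\ 0.20 & 0.80 \end{pmatrix}, \quad \pi^{00}_{\theta_0} = \begin{pmatrix} 0.99 & 0.01 \\ 0.22 & 0.78 \end{pmatrix}.$$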



Each emission probability distribution $\mathbb{P}(X_u \mid S_u)$ has a zero-mean 
Gaussian density with standard deviation depending on 
$S_u$ as follows: 





= 11.8,^(2) = 


= 67.1 




-24.6,a e o (2) 




= 4.1,<(2) = 


29.3 


<M) = 


6-9, < (2) = 




= 2.8,a°°(2) = 


10.3 


<°(D = 


3-l,<(2) = 



14.8 

For instance, the probability density function $f_{\theta_1}(X_{10} \mid S_{10})$ is 
$\mathcal{N}(0, \sigma^{10}_{\theta_1}(1))$ if $S_{10} = 1$ and $\mathcal{N}(0, \sigma^{10}_{\theta_1}(2))$ if $S_{10} = 2$. 
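Eq. (2) becomes 

$$\begin{aligned}
K_{ua \to u}(S_u) &= \sum_{S_{ua}} \int \mathbb{P}_{\theta_1}(S_{ua} \mid S_u)\, f_{\theta_1}(X_{ua} \mid S_{ua}) \left( \log \frac{\mathbb{P}_{\theta_1}(S_{ua} \mid S_u)\, f_{\theta_1}(X_{ua} \mid S_{ua})}{\mathbb{P}_{\theta_0}(S_{ua} \mid S_u)\, f_{\theta_0}(X_{ua} \mid S_{ua})} + \sum_{b \in \mathcal{V}} K_{uab \to ua}(S_{ua}) \right) dX_{ua} \\
&= \sum_{S_{ua}} \mathbb{P}_{\theta_1}(S_{ua} \mid S_u) \left( \log \frac{\mathbb{P}_{\theta_1}(S_{ua} \mid S_u)}{\mathbb{P}_{\theta_0}(S_{ua} \mid S_u)} + D\left[ \mathcal{N}(0, \sigma^{ua}_{\theta_1}(S_{ua})) \,\|\, \mathcal{N}(0, \sigma^{ua}_{\theta_0}(S_{ua})) \right] + \sum_{b \in \mathcal{V}} K_{uab \to ua}(S_{ua}) \right),
\end{aligned}$$

where $a \in \mathcal{V}$. If $X_{ua}$ is a leaf then $K_{uab \to ua}(S_{ua}) = 0$. If 
$X_{ua}$ is not a leaf then $K_{uab \to ua}$ does not depend on $b \in \mathcal{V}$ 
and therefore 

$$\sum_{b \in \mathcal{V}} K_{uab \to ua}(S_{ua}) = 2 \cdot K_{ua0 \to ua}(S_{ua}).$$

Similarly, one can obtain the formula for the KLD. 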

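Finally, we recall that the KLD between two Gaussians can 
be computed with the well-known formula 

$$D(\mathcal{N}(\mu_1, \sigma_1) \| \mathcal{N}(\mu_0, \sigma_0)) = \frac{\sigma_1^2 + (\mu_1 - \mu_0)^2}{2 \sigma_0^2} + \log \frac{\sigma_0}{\sigma_1} - \frac{1}{2}.$$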
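The Gaussian formula is easy to check in code. The short Python sketch below is illustrative; the function name is ours, the second argument of each Gaussian is read as a standard deviation (as in the emission model above), and natural logarithms are assumed.

```python
import math

def gaussian_kld(mu1, sigma1, mu0, sigma0):
    """KLD in nats between N(mu1, sigma1) and N(mu0, sigma0).

    sigma1 and sigma0 are standard deviations and must be positive.
    """
    return ((sigma1 ** 2 + (mu1 - mu0) ** 2) / (2 * sigma0 ** 2)
            + math.log(sigma0 / sigma1) - 0.5)

# Identical Gaussians are at distance zero; a unit mean shift with unit
# variances gives exactly 1/2.
print(gaussian_kld(0.0, 1.0, 0.0, 1.0))
print(gaussian_kld(1.0, 1.0, 0.0, 1.0))
```

For the zero-mean emissions used in the HMT experiment, only the ratio of the two standard deviations matters, since the mean term vanishes.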



